ABSTRACT

We investigate how different preservation policies, ranging from Least aggressive to Most aggressive, affect the level of preservation achieved by autonomic processes used by smart digital objects (DOs). The mechanisms used to support preservation across different hosts can be used for automatic link generation, and they support preservation activities by moving data preservation from an archive-centric perspective to a data-centric perspective. Based on simulations of small-world graphs of DOs created using the Unsupervised Small-World algorithm, we report quantitative and qualitative results for graphs ranging in size from 10 to 5,000 DOs. Our results show that a Most aggressive preservation policy makes the best use of distributed host resources while using half as many messages as a Moderately aggressive preservation policy.

Categories and Subject Descriptors 

H.3.7 [Information Storage and Retrieval]: Systems issues



General Terms 

Algorithms, Design, Experimentation 

1. MOTIVATION 

Much of our current cultural heritage exists only in digital format, and digital preservation approaches rely on the long-term commitment of individuals, institutions and companies to preserve this heritage. The length of time that an individual will be engaged in preservation activities is, by definition, limited to their lifetime (and probably just the middle part of that life). Even those few years may be longer than institutions and companies are willing to commit to digital preservation. Institutions and companies may cease to exist, or may be unwilling or unable to meet their original preservation commitments because of changes in corporate culture or financial considerations. If this happens, the digital files and their information (our heritage) may become irretrievably lost. The recognition that much of our heritage exists only in digital format, and that there is a real risk of total loss through accident [H] or a change in business goals [23], has appeared in academic reports and papers [23] and is starting to surface in the popular press [371 HOl IM fT6].

Our motivation is to change the focus from preservation services administered by institutions (a repository-centric perspective) to one where the data preserves itself (a data-centric perspective). We continue to investigate this data-centric perspective through the Unsupervised Small-World (USW) graph creation algorithm [S] |31 [3 [6], with which we have shown that DOs instrumented with just a few rules can autonomously form small-world graphs. This work augments that prior work by giving DOs the capability to create a number of copies of themselves for preservation purposes. We focus on determining when copies should be created during the USW process and on the communication impacts of different preservation policies.

2. RELATED WORK 

This work is at the convergence of digital library repositories, emergent behavior, graph theory and web infrastructure. To provide context for the contributions of this research, we first briefly review how objects are stored in repositories, as well as the nature and types of various networks or graphs.

2.1 Repositories 

Repositories range from theoretical to ready-to-download. Some, such as SAV [9], are frameworks or architectural proposals. Some, like FEDORA [26], are middleware systems, ready to be the core repository technology in a local deployment. Some, such as aDORe [36], are complete systems, ready to deploy. These include DSpace [35], sponsored by MIT and HP Laboratories, and LOCKSS [22], sponsored by the Stanford University Libraries. All are widely implemented and enjoy large user communities. DSpace is an institutional repository, intended to archive the intellectual output of a university's faculty and students. LOCKSS allows libraries to create "dark archives" of publishers' websites. As long as the publishers' websites are available, all web traffic goes to those sites. But if the publishers' contents are lost, the dark archives are activated and the content is available again. Risk is mitigated through many sites archiving content of their own choosing. Depending on an institution's requirements, the systems described above can be quite attractive. But any repository system carries an implicit assumption: that a person, community or institution exists to tend to the repository. What happens when the responsible organization no longer exists? There are repository trading and synchronization provisions (e.g., [10]), but most are specific to a particular repository architecture.

Cooperative File System (CFS) [11], Internet Backplane Protocol (IBP), Storage Resource Broker (SRB) and OceanStore [31] are among several generic network storage systems and APIs that have also been proposed. CFS and OceanStore rely on distributed hash tables and an overlay network to locate content in the Internet. Systems with such additional levels of shared infrastructure have not been widely deployed. IBP and SRB are more traditional in their repository design and have enjoyed greater deployment. SRB (and its follow-on, iRODS [28]) has a user community similar in size to those of LOCKSS and Fedora.

Numerous P2P repositories have also been proposed (for example, Intermemory [15], Freenet [8], Free Haven [12], and PAST [33]). These repositories are frequently characterized by offering long-term storage that requires the contribution of X megabytes of storage today for the promise of Y megabytes of persistent storage (X > Y). Despite having many theoretically attractive features, these systems have not found widespread acceptance. We use a variant of this idea in our graph construction techniques by simulating a host that has effectively infinite capacity for DOs created locally and a very limited capacity for DOs created remotely.
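The host model described above can be sketched as a small class. This is a minimal illustration under our own assumptions, not the simulator used in this paper: the `Host` class and its method names are hypothetical, capacity is counted in whole preservation copies, and the limited capacity for remotely created DOs is a fixed number of slots (called heap later in this paper). The sketch also applies the rule, stated later, that a host never stores a copy of a DO that is local to it.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host: unlimited room for its own DOs, `heap` slots for others."""
    host_id: int
    heap: int
    local_dos: set = field(default_factory=set)
    foreign_copies: set = field(default_factory=set)

    def add_local(self, do_id: int) -> None:
        # DOs created on this host are never refused (effectively infinite capacity).
        self.local_dos.add(do_id)

    def accept_copy(self, do_id: int) -> bool:
        """Store a preservation copy of a remotely created DO if a heap slot is free."""
        if do_id in self.local_dos or do_id in self.foreign_copies:
            return False  # never a copy on the DO's own host, nor a duplicate copy
        if len(self.foreign_copies) >= self.heap:
            return False  # heap exhausted
        self.foreign_copies.add(do_id)
        return True


host = Host(host_id=1, heap=2)
for d in range(100):
    host.add_local(d)
# prints: 100 True True False
print(len(host.local_dos), host.accept_copy(200), host.accept_copy(201), host.accept_copy(202))
```

The asymmetry is the whole point of the design: local creation is never blocked, so scarcity only ever appears when DOs try to replicate onto other hosts.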

Each of the approaches listed above inherently relies on human and institutional intervention in the digital preservation activities of refreshing and migration [38, 32]. The digital preservation activities of emulation and metadata attachment are outside the scope of this paper. As the amount of digital data continues to grow (potentially at an exponential rate), the organizational and human cost of keeping up with traditional approaches will become overwhelming. An alternative approach is to revisit the definition of a DO and to incorporate into it the idea that the DO is empowered to make copies of itself for the purposes of preservation and that it can communicate with other DOs. Messages that can be sent include the locations of new supporting preservation hosts, data migration services and new DOs.

2.2 Graph Construction 

Our approach to constructing a small-world network of DOs for self-preservation differs from those that others have used or proposed. We use the definition of a small-world graph as one that has a high clustering coefficient compared to a randomly created graph and an average path length that grows logarithmically with the number of nodes in the graph [39]. The Watts-Strogatz approach to constructing such a graph is to take a lattice graph of degree k and size n and perturb the links to create a graph with small-world characteristics. Some approaches make connections between nodes in proportion to the destination node's degree [25, 20, 1], a kind of preferential attachment or fitness policy. Yet another type of approach takes an existing graph and grows a small world by adding new links [13, 19]. Others connect a node to a fixed number of vertices based on their degree [3], or even create a small-world graph from a random one [14].
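The small-world definition above can be checked numerically. The sketch below, using only the Python standard library, builds a Watts-Strogatz graph (a ring lattice of degree k whose edges are each rewired with probability p) and compares its average clustering coefficient and average path length with a random graph having the same number of edges. It illustrates the Watts-Strogatz construction, not USW, and the sizes and rewiring probability are arbitrary choices for illustration.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Ring lattice: each node links to its k/2 nearest neighbors on each side (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def watts_strogatz(n, k, p, rng):
    """Rewire each lattice edge (v, v+j) with probability p to a random new endpoint."""
    adj = ring_lattice(n, k)
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            if u not in adj[v] or rng.random() >= p:
                continue
            targets = [w for w in range(n) if w != v and w not in adj[v]]
            if targets:
                w = rng.choice(targets)
                adj[v].discard(u)
                adj[u].discard(v)
                adj[v].add(w)
                adj[w].add(v)
    return adj


def random_graph(n, m, rng):
    """Erdos-Renyi G(n, m): m distinct edges chosen uniformly at random."""
    adj = {v: set() for v in range(n)}
    for u, v in rng.sample(list(combinations(range(n), 2)), m):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def clustering(adj):
    """Average local clustering coefficient (nodes with degree < 2 count as 0)."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2 * links / (d * (d - 1))
    return total / len(adj)


def avg_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs, via BFS."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(42)
ws = watts_strogatz(500, 10, 0.1, rng)
edges = sum(len(nbrs) for nbrs in ws.values()) // 2
er = random_graph(500, edges, rng)
print(f"small-world: C={clustering(ws):.3f}  L={avg_path_length(ws):.2f}")
print(f"random:      C={clustering(er):.3f}  L={avg_path_length(er):.2f}")
```

With these settings one would expect the rewired graph to keep a clustering coefficient far above that of the random graph while its average path length stays comparable, which is the small-world signature.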

The USW process requires that each new node communicate with an existing node in the USW graph. After the first DO selection, the USW algorithm controls where the DO fits into the graph and how many edges are created to other DOs in the system. USW is the only small-world graph creation algorithm that we know of in which connections between DOs are made based on information that the DO gleans before making its first connection.



3. SELF-PRESERVING DIGITAL 
OBJECTS 

We consider DOs to be in the tradition of Kahn-Wilensky and related implementations [18]. This paper focuses on the analysis of inter- and intra-DO policies for preservation through simulation. In a separate project we are implementing a test bed of DOs as web resources that use OAI-ORE [2T] Resource Maps to keep track of the contents of DOs, the location of supporting web services, and the JavaScript necessary to implement the policies presented here. The test bed will feature DOs that use a variety of storage layers, such as repository systems (e.g., DSpace, FEDORA), file systems, web storage services (e.g., Amazon S3), wikis, blogs, and email accounts (e.g., Gmail).

3.1 Flocking for Preservation 

Craig Reynolds' seminal paper on "boids" [30] demonstrated that three simple rules were sufficient to simulate the complex behaviors of schools of fish, flocks of birds, herds of animals and the like. The remarkable feature of these rules is that they are scale-free, so knowing the size of the entire group or network is not required. We believe these rules can be adapted to create self-preserving DOs with similarly complex emergent behaviors. The transcription of Reynolds' rules from a boid to a DO perspective is as follows:

Collision avoidance. DOs flocking to a new repository cannot overwrite each other (collide in physical storage), nor collide in namespaces (have the same URI). This is orthogonal to the naming mechanism used: URIs, URNs, handles, DOIs, globally unique identifiers (GUIDs) or content-addressable naming schemes [29].
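One way to satisfy the namespace half of collision avoidance is content-addressable naming, where a DO's name is derived from its bytes. The sketch below assumes SHA-256 digests, an illustrative `sha256:` name prefix (not a registered URI scheme) and a hypothetical in-memory `Repository` class; none of these are part of the systems cited above. Because the name is a function of the content, identical content maps to one name, and the store refuses to overwrite an existing name with different bytes.

```python
import hashlib


def content_name(data: bytes) -> str:
    """Name derived from the content itself (the 'sha256:' prefix is illustrative)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class Repository:
    """Hypothetical in-memory store that enforces collision avoidance."""

    def __init__(self) -> None:
        self._objects = {}

    def store(self, data: bytes) -> str:
        name = content_name(data)
        existing = self._objects.get(name)
        if existing is not None and existing != data:
            # Different bytes under an existing name: refuse to overwrite.
            raise ValueError(f"namespace collision for {name}")
        # Identical bytes map to the same name, so storing them again is harmless.
        self._objects[name] = data
        return name


repo = Repository()
first = repo.store(b"a digital object")
assert repo.store(b"a digital object") == first  # same content, same name
```

With a cryptographic digest, a genuine collision between different contents is practically impossible, so the explicit check mainly guards against corrupted or tampered entries.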

Velocity matching. All members of a herd, school or flock move at roughly the same speed. With boids, the idea is to travel at the same speed as your neighbors. Interpreting velocity as resource consumption (i.e., storage space) enables this rule to be applied to a DO environment. Specifically, a DO should try to consume as much storage as everyone else, and only as much. In resource-rich environments (lots of storage space available on lots of hosts), making as many copies of yourself as you would like is easy. When storage becomes scarce, this becomes more difficult. DOs must be able to delete copies of themselves from different repositories to make room for late-arriving DOs in low-storage situations. DOs will never delete the last copy of themselves to make room for new DOs, but they will delete copies of themselves to come down from a hard threshold (e.g., 10 copies) to a soft threshold (e.g., 3). When resources become plentiful again, new copies can be made.
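The threshold behavior described in the velocity matching rule can be written as two small decision functions. This is a hedged sketch: the function names are hypothetical, `upper` and `lower` stand for the two thresholds in the example above (10 and 3 copies), and whether storage is "scarce" is taken as a given boolean rather than derived from host state.

```python
def copies_to_delete(current: int, lower: int, storage_scarce: bool) -> int:
    """How many copies a DO gives up to make room for late-arriving DOs.

    Only when storage is scarce does a DO trim toward its lower threshold,
    and it never deletes its last copy, even if `lower` were set below 1.
    """
    if not storage_scarce or current <= lower:
        return 0
    return current - max(lower, 1)


def copies_to_make(current: int, upper: int, free_slots: int) -> int:
    """When storage is plentiful again, grow back toward the upper threshold."""
    return max(0, min(upper - current, free_slots))


# Example thresholds from the text: 10 copies (upper) and 3 copies (lower).
print(copies_to_delete(10, 3, storage_scarce=True))   # 7
print(copies_to_make(3, 10, free_slots=4))            # 4
```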

Flock centering. For boids, this means staying near (but not colliding with) other flock-mates. We interpret this in a manner similar to velocity matching, with DOs attempting to stay near other DOs as they make copies of themselves at new repositories. In essence, when a DO learns of a new repository and makes a copy of itself there, it should tell the other DOs it knows so they will have the opportunity to make copies of themselves at the new location. Announcing the location of a new repository will thus cause DOs at other repositories that have not reached their soft threshold to create copies that "flow" to the new repository.
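Flock centering can be sketched as a one-hop announcement over friendship links. In this minimal illustration the `DigitalObject` class, its fields and the one-hop propagation are our assumptions; host capacity is ignored, and a DO that receives the announcement copies itself only if it is below its soft threshold and does not already have a copy at the new repository.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical DO: `hosts` are the repositories holding a copy of it."""
    name: str
    soft_threshold: int
    hosts: set = field(default_factory=set)
    friends: list = field(default_factory=list)

    def discover(self, repo: str) -> list:
        """Copy to a newly found repository, then announce it to friends.

        Returns the names of the friends whose copies 'flowed' to `repo`.
        """
        self.hosts.add(repo)
        return [f.name for f in self.friends if f.maybe_copy_to(repo)]

    def maybe_copy_to(self, repo: str) -> bool:
        """Copy to `repo` only if below the soft threshold and not already there."""
        if repo in self.hosts or len(self.hosts) >= self.soft_threshold:
            return False
        self.hosts.add(repo)
        return True


a = DigitalObject("a", 3)
b = DigitalObject("b", 3, {"r1"})
c = DigitalObject("c", 1, {"r1"})
a.friends = [b, c]
print(a.discover("r2"))  # ['b']: c has already met its soft threshold
```

Limiting the announcement to direct friends keeps each discovery's message cost proportional to the discovering DO's degree rather than to the size of the graph.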

The benefits of using the boids model are: it is simple to implement and test (cf. iRODS); all decisions are made using locally gleaned information; there are no global controls with the attendant communication overhead costs; and once a DO is created and introduced into the USW network, the DO is responsible for its own destiny. Simple rules that are executed based on locally gleaned information result in emergent intelligent and social behaviors.

Figure 1: The USW growth algorithm with 4 DOs. Panels: (a) $DO_{1,0,*}$; (b) $DO_{2,0,*}$ contact; (c) $DO_{2,0,*}$ link; (d) $DO_{3,0,*}$ first contact; (e) $DO_{3,0,*}$ second contact; (f) $DO_{3,0,*}$ first link; (g) $DO_{4,0,*}$ first contact; (h) $DO_{4,0,*}$ second contact; (i) $DO_{4,0,*}$ first link. The "wandering" DO symbol is filled. Dashed lines are communications. Solid lines are friendship links.

At the macro level, in much the same way that flocks self-navigate to new locations that have the resources they need, we envision DOs self-preserving in a loose confederation of cooperating repositories, each with varying levels of resources and availability. Copies are made in new repositories opportunistically, within the guidelines imbued in the DOs at creation time. From time to time an archivist may steer the entire collection (or parts of it) to new archives, but for the most part the DOs replicate and preserve themselves.

3.2 Unsupervised Small-World Graph Creation

We introduce some terminology to discuss how DOs can self-arrange. Friends are DOs that share an edge. When a DO is created, it is introduced to an existing DO in the graph and is called a wandering DO. While wandering, a DO accumulates a list of potential friends from other DOs in the graph. When a wandering DO makes its first friendship link to a DO, the no-longer-wandering DO uses the information it has gleaned about other DOs to create additional friendship links. This process with 4 DOs is shown in Figure 1. Friendship links are separate from HTML navigation links (i.e., <link> instead of <a> HTML elements). A family is the collection of DOs that are replicas of each other. A parent is the family member responsible for meeting the family's preservation goals.
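To make the wandering-then-linking sequence concrete, the sketch below inserts DOs one at a time into a graph stored as an adjacency map. It is a simplified illustration, not the published USW rules: the random walk of `walk_len` steps, the choice of the DO where the walk stops as the first friend, and the `extra_links` additional friends drawn from the gleaned candidates are all hypothetical parameters chosen for this example.

```python
import random


def add_wandering_do(graph, new, walk_len, extra_links, rng):
    """Simplified, hypothetical USW-style insertion of DO `new` into `graph`.

    `graph` maps each DO id to the set of its friends (friendship links).
    """
    if not graph:
        graph[new] = set()  # the very first DO has nobody to befriend
        return
    current = rng.choice(sorted(graph))  # introduction to an existing DO
    gleaned = []  # potential friends learned while wandering
    for _ in range(walk_len):
        gleaned.append(current)
        gleaned.extend(sorted(graph[current]))
        if not graph[current]:
            break
        current = rng.choice(sorted(graph[current]))
    # First friendship link: the DO where the wandering stops.
    graph[new] = {current}
    graph[current].add(new)
    # Additional links chosen from the deduplicated gleaned candidates.
    pool = [d for d in dict.fromkeys(gleaned) if d != current]
    for friend in rng.sample(pool, min(extra_links, len(pool))):
        graph[new].add(friend)
        graph[friend].add(new)


rng = random.Random(7)
graph = {}
for do_id in range(50):
    add_wandering_do(graph, do_id, walk_len=3, extra_links=2, rng=rng)
print(sum(len(f) for f in graph.values()) // 2, "friendship links")
```

All decisions use only what the new DO learned during its walk, which mirrors the local-information property the USW process relies on.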



Friendship links serve as a way for DOs to send messages to one another, such as announcements that new storage locations are available or notices about the scope and migration of file formats (cf. the semi-automated alert system described in Panic [E]). Friendship links are used to support the preservation process and meet the spirit of preservation refreshing.

4. SELF-PRESERVATION POLICIES FOR 
PRESERVATION 

4.1 Model 

We simulate three different replication policies to quantify and qualify their effects on the system in two areas. The first is how effective the replication policy is at enabling as many DOs as possible to achieve their desired maximum number of preservation copies. The second is the communication cost associated with each replication policy as the system grows in size.

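A DO's family members will be spread across a collection of hosts. A complete description of a DO's position in a family structure and the host that it lives on is given by the notation $DO_{n,c,h}$, where:

$$
\begin{aligned}
\text{limits:}\quad & c_{\mathrm{soft}} = \text{preservation copies}\\
& c_{\mathrm{hard}} = \text{max. preservation copies}\\
& n_{\max} = \text{max. DOs}\\
& h_{\max} = \text{max. hosts}\\[4pt]
n, c, h \text{ defined as:}\quad & n = 1, \ldots, n_{\max}\\
& c = 0, \ldots, c_{\mathrm{hard}}\\
& h = 1, \ldots, h_{\max}\\[4pt]
\text{subject to:}\quad & (n, h) \text{ unique } \forall n \text{ and } \forall h\\
& c = \begin{cases} 0 & \text{parent DO}\\ \geq 1 & \text{otherwise} \end{cases}\\
& 0 < c_{\mathrm{soft}} < c_{\mathrm{hard}}
\end{aligned}
$$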
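The constraints on $DO_{n,c,h}$ can be checked mechanically. The sketch below encodes our reading of the notation (n in 1..n_max, c in 0..c_hard, h in 1..h_max, c = 0 exactly for the parent, and at most one family member of DO n per host h); the `Placement` tuple and `check_placements` function are hypothetical names introduced for illustration, and the numeric limits in the usage line are arbitrary.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """One family member DO_{n,c,h}, plus whether it is the family's parent."""
    n: int        # which DO (family) this is
    c: int        # copy number; 0 for the parent
    h: int        # host the member lives on
    parent: bool


def check_placements(placements, n_max, c_hard, h_max):
    """Return the list of constraint violations (empty when all constraints hold)."""
    errors = []
    seen = set()
    for p in placements:
        if not 1 <= p.n <= n_max:
            errors.append(f"{p}: n out of range")
        if not 0 <= p.c <= c_hard:
            errors.append(f"{p}: c out of range")
        if not 1 <= p.h <= h_max:
            errors.append(f"{p}: h out of range")
        if (p.c == 0) != p.parent:
            errors.append(f"{p}: c must be 0 exactly for the parent")
        if (p.n, p.h) in seen:
            errors.append(f"{p}: DO {p.n} already has a member on host {p.h}")
        seen.add((p.n, p.h))
    return errors


family = [Placement(1, 0, 1, True), Placement(1, 1, 2, False), Placement(1, 2, 2, False)]
print(check_placements(family, n_max=500, c_hard=5, h_max=1000))
```

Here the third placement is reported because DO 1 would then have two family members on host 2.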

4.2 Policies 

We focus on the following preservation policies (assuming that the DO values for $c_{\mathrm{soft}}$ and $c_{\mathrm{hard}}$ have been defined):

1. Least aggressive: a DO will make only a single preservation copy at a time, regardless of how many copies are needed or how many opportunities are available, and will continue to make single copies until it reaches $c_{\mathrm{hard}}$.

2. Moderately aggressive: a DO will make as many copies as it can to reach $c_{\mathrm{soft}}$ when it makes its first connection, then fall back to the Least aggressive policy.

3. Most aggressive: a DO will make as many copies as it can to reach $c_{\mathrm{hard}}$ when it makes its first connection, then fall back to the Least aggressive policy.

The effect of both the Moderately and Most aggressive preservation policies is that, after reaching their respective goals, they behave like the Least aggressive policy.
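The three policies differ only in how many copies a DO attempts at its first connection. The following sketch captures that difference; the `Policy` enum and `copies_to_request` function are hypothetical, `available` stands for the number of candidate hosts the DO currently knows of, and capping the request by that number is our assumption about what "as many copies as it can" means.

```python
from enum import Enum


class Policy(Enum):
    LEAST = "Least aggressive"
    MODERATE = "Moderately aggressive"
    MOST = "Most aggressive"


def copies_to_request(policy, current, c_soft, c_hard, first_connection, available):
    """Number of copies a DO tries to make at one opportunity.

    current: copies it already has; available: candidate hosts it currently knows.
    """
    if current >= c_hard or available <= 0:
        return 0
    if first_connection and policy is Policy.MODERATE:
        goal = c_soft
    elif first_connection and policy is Policy.MOST:
        goal = c_hard
    else:
        # Least aggressive behavior, and the fallback for the other two
        # policies: one copy per opportunity until c_hard is reached.
        return 1
    # "As many copies as it can": bounded by the known candidate hosts.
    return min(max(goal - current, 1), available)


for policy in Policy:
    print(policy.value, copies_to_request(policy, 0, 3, 5, True, 10))
```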

4.3 Evaluation 

Figure 2 serves as a legend for the sub-figures in Figures 3 and 4 and shows DO and host preservation status as a function of $S_t$. Figure 2 is divided into four areas. The left half shows DO-related data, while the right half shows host data. DOs are sequentially added to the simulation.
Figure 2: A snapshot of the Least aggressive preservation policy (status lines: "Time step 104 of 334. 501 nodes in the system" for DOs; "Time step 104 of 334. 338 hosts used (out of 1000), 121 hosts preserving" for hosts). DOs are shown on the left and hosts on the right. The colors show the state of each DO's preservation copies, or the share of each host's preservation capacity used, at the time of the measurement. Under each circular plot is a histogram over $S_t$. Above each circular plot is a status line showing $S_t$ and either how many DOs are in the system or how many hosts are active and preserving data.



In Figure 2, DOs are added in a spiral starting at the center of the "circular" plot, with newer DOs plotted progressively farther from the center. This presentation is similar to the rings of a tree: the oldest DOs are in the center and the youngest are on the outer edge.

The preservation status of a DO is approximated by the color assigned to it. Initially the DO has $c = 0$ copies and is colored red. As the DO creates copies, its color changes to yellow. When the DO reaches $c_{\mathrm{soft}}$, the color changes to green. When $c_{\mathrm{hard}}$ is reached, the DO turns blue. The rules of the simulation (based on our interpretation of Reynolds' "boids") permit the killing of one DO's preservation copies to make room for a copy of a DO that needs to reach its $c_{\mathrm{soft}}$ (i.e., if $DO_{i,c,h}$ has more than its $c_{\mathrm{soft}}$ copies and $DO_{j,c,h}$ has not reached its $c_{\mathrm{soft}}$, then $DO_{i,c,h}$ will sacrifice one of its copies so that $DO_{j,c,h}$ can move closer to $c_{\mathrm{soft}}$). Sacrificing a preservation copy for the betterment of the whole is the embodiment of velocity matching. The effect of this behavior is that a DO can change color from red to yellow to green and then possibly to blue. If the DO changes to blue, it might oscillate between green and blue as its number of preservation copies oscillates between $c_{\mathrm{soft}}$ and $c_{\mathrm{hard}}$. A DO will never sacrifice a copy if it has not exceeded its $c_{\mathrm{soft}}$. The histogram under the DO circular plot shows the percentage of DOs in each of the different preservation copy states as a function of $S_t$.
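The color mapping and the sacrifice rule above can be sketched as follows. The function names are hypothetical, copy counts are plain integers, and the sketch assumes that the slot freed by the sacrificing DO is immediately used by the DO that is below its soft threshold.

```python
def do_color(copies: int, c_soft: int, c_hard: int) -> str:
    """Color of a DO in the circular plot, from its number of preservation copies."""
    if copies == 0:
        return "red"
    if copies < c_soft:
        return "yellow"
    if copies < c_hard:
        return "green"
    return "blue"


def sacrifice(donor: int, needy: int, c_soft: int) -> tuple:
    """Velocity matching: a DO above c_soft yields one copy slot to a DO below c_soft.

    Returns the updated (donor, needy) copy counts; unchanged if the rule does not apply.
    """
    if donor > c_soft and needy < c_soft:
        return donor - 1, needy + 1
    return donor, needy


donor, needy = 5, 0
print(do_color(donor, 3, 5), do_color(needy, 3, 5))  # blue red
donor, needy = sacrifice(donor, needy, c_soft=3)
print(do_color(donor, 3, 5), do_color(needy, 3, 5))  # green yellow
```

The example reproduces the green/blue oscillation described in the text: the donor drops from blue to green while the needy DO moves from red to yellow.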

The preservation utilization status of a host is shown in the right half of Figure 2. The universe of possible hosts is constant and is represented by the entire right half of the plot. Hosts that are not being used are shown in grey. The placement of a host in the figure is based on the host's sequential number in the simulation. Hosts that are used are drawn in one of five colors. If a host is used in the simulation but is not hosting any preservation copies, it is colored white. If less than 25% of the host's capacity is used, it is colored red. Similarly, it is yellow if less than 50% is used, green if less than 75%, and blue if 75% or more is used. The histogram on the host side shows the percentage of hosts in each of these states.
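The host color bands and the accompanying histogram can be sketched as below. Capacity is measured in copy slots (heap), treating exactly 75% utilization as blue is our assumption, and the function names are hypothetical.

```python
from collections import Counter


def host_color(used: int, heap: int, active: bool = True) -> str:
    """Color of a host from the number of heap slots its foreign copies use."""
    if not active:
        return "grey"          # not yet part of the simulation
    if used == 0:
        return "white"         # in use, but preserving nothing
    fraction = used / heap
    if fraction < 0.25:
        return "red"
    if fraction < 0.50:
        return "yellow"
    if fraction < 0.75:
        return "green"
    return "blue"              # 75% or more (boundary case is our assumption)


def histogram(colors):
    """Percentage of hosts in each color state."""
    counts = Counter(colors)
    return {color: 100.0 * k / len(colors) for color, k in counts.items()}


heap = 5
# prints: ['white', 'red', 'yellow', 'green', 'blue', 'blue']
print([host_color(used, heap) for used in range(heap + 1)])
```

With a heap of 5 slots, each additional stored copy moves a host up by 20 percentage points, so every color band is reachable.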

In the simulation, each host has a finite amount of storage that it makes available for DOs that originate on other hosts. This storage is called heap. The simulation has $n_{\max} = 500$, '^soft^^' ^hard^ ^', $h_{\max} = 1000$ and heap = 5. The simulation runs until it reaches a steady state, that is, until the system stops evolving. Evolution stops when DOs are unable to locate candidate hosts on which to store additional preservation copies. Steady state is reached at different times depending on the preservation policy. In all cases, all $n_{\max}$ DOs have been introduced into the simulation by $S_t = 3500$.
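The steady-state criterion (stop when no DO can find a candidate host) can be sketched as a loop that repeats passes over all DOs until a pass places no copy. This is a simplified illustration under stated assumptions, not the simulator used here: each DO's set of known hosts is fixed in advance rather than learned through friends, every DO follows the Least aggressive policy (one copy per pass), and a host is a candidate only if it is not the DO's home host, does not already hold a copy of that DO, and has a free heap slot. The function name and parameters are hypothetical.

```python
import random
from collections import Counter


def run_until_steady(known_hosts, home, c_hard, heap, rng, max_passes=10_000):
    """Place copies, one per DO per pass, until a pass places nothing.

    known_hosts[d]: hosts that DO d knows of; home[d]: the host DO d lives on.
    Returns (hosts holding each DO's copies, number of passes run).
    """
    used = Counter()                       # host -> foreign copies stored
    copies = {d: set() for d in known_hosts}
    for n_pass in range(1, max_passes + 1):
        placed = 0
        for d in rng.sample(sorted(known_hosts), len(known_hosts)):
            if len(copies[d]) >= c_hard:
                continue
            candidates = [h for h in sorted(known_hosts[d])
                          if h != home[d] and h not in copies[d] and used[h] < heap]
            if candidates:
                h = rng.choice(candidates)
                copies[d].add(h)
                used[h] += 1
                placed += 1
        if placed == 0:                    # no DO found a candidate: steady state
            return copies, n_pass
    return copies, max_passes


known = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}}
home = {0: 0, 1: 1, 2: 2}
copies, passes = run_until_steady(known, home, c_hard=2, heap=1, rng=random.Random(3))
print(copies, passes)
```

Because every productive pass consumes at least one heap slot and the total number of slots is finite, the loop always terminates.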

The initial DO is plotted in the center of the upper-left quadrant of each composite. Figure 3(a) shows the first 5 DOs in the system. The one in the center is the oldest DO, while the others are younger. The five DOs currently in the system live on hosts in the system. Hosts can live anywhere on the network, and where a particular host is drawn is immaterial. The hosts in Figure 3(a) have a finite capacity (heap) that their respective system administrators have allocated to preserving copies of "foreign" DOs.

At any point in time during the simulation, there will likely be a difference between the number of preservation copies that the DOs want to create and the preservation capacity of all the hosts. Reynolds' rules attempt to balance these two requirements over time. Figure 3(a) indicates that the DOs have each made some number of copies (they are colored yellow rather than red) and that those copies are spread unevenly across some of the hosts. One host has used all its capacity (shown in blue), while one has not used any (shown in white). The remaining hosts fall between those two extremes (they are yellow and red). In Figure 3(a) the histograms do not show much information because of the internal simulation activity that precedes the introduction of the first DO.

In Figure 3(b) the tree-ring growth of the DOs is becoming more apparent. Older DOs have had more opportunities to make preservation copies of themselves, so there is more green and blue in the center of the DO plot. Many of the hosts have reached heap, as indicated by the number of blue hosts. The histograms are starting to fill with data. The DO histogram shows that the percentage of DOs that have made some, but not all, of their preservation copies (those in yellow) is growing, while the percentage of those that have reached their goals is shrinking. The host histogram shows that the percentage of hosts not yet discovered and added to the system (the grey area) is starting to decrease. A DO will be local to exactly one host. A host may have more than one DO local to it. A DO will not put a preservation copy on the host that it lives on, or on a host that already has a preservation copy of itself.

In Figure 3(c) the tree-ring presentation of DO preservation success is becoming more pronounced. Younger DOs are struggling to make copies, while the older ones are maintaining theirs. More of the hosts are being brought into the system (the percentage of grey hosts is decreasing), but a significant percentage of the hosts are not being used for preservation (those shown in white).

In Figure 3(d) all DOs have been introduced into the system. The tree-ring preservation effect is still evident, and some of the new DOs have been fortunate enough to make some number of preservation copies (as shown by the yellow markers in the sea of red). The percentage of hosts that are still not preserving any DOs remains significant, and the percentage of hosts that have reached heap is holding constant. The system will continue to evolve until it reaches a steady state, in which the DOs have preserved as many copies of themselves as they can, given their knowledge of hosts with excess preservation capacity. The steady state for this particular graph is shown in Figure 4(a).

Figure 3: The growth of an $n_{\max} = 500$ DO system captured at various time steps: (a) $S_t = 1500$; (b) $S_t = 1700$; (c) $S_t = 2200$; (d) $S_t = 3500$. The left half of each sub-figure shows the "tree ring" growth of the DO portion of the system. The DO and host histograms show the percentage of DOs and hosts in their respective states as a function of time. All DOs have been created and assigned to a host by $S_t = 3500$.
Figure 4 shows the steady-state condition of the same system under the three different preservation policies. All DOs have been introduced into the system by $S_t = 3500$ (as shown by the "kink" in the histogram of the percentage of hosts used). Each preservation policy resulted in a significantly different time to reach a steady state. The hosts have enough preservation capacity to accommodate the preservation needs of the DOs, a Boundary High condition (see Table 1). If a DO can locate enough unique hosts via its friends, it will be able to meet its preservation goals. These representative values for the number of DOs, desired preservation levels and host preservation capacity were chosen to illustrate the interaction between the DOs as they move preservation copies from one host to another while attempting to meet the preservation needs of as many DOs as possible.

The Least aggressive policy reaches steady state at $S_t = 8195$ (see Figure 4(a)), and a significant percentage of the DOs have not been able to make any preservation copies (as shown by the lowest, red band in the histogram). As shown in the host half of the figure, many of the hosts are not preserving any DOs, and those hosts that are preserving have reached their capacity.

The Moderately aggressive policy reaches steady state at $S_t = 12599$ (see Figure 4(b)). Before $S_t = 3500$, most of the DOs have made most of their preservation copies. After that time, the percentage achieving $c_{\mathrm{hard}}$ slowly increases until the system reaches steady state. The hosts' preservation capacity is used by the DOs in the system almost as quickly as the hosts come online. This is indicated by the very narrow white region between the unused host region and the totally used region. At steady state, only a very few of the hosts have not been totally used (as shown by the few host usage squares that are neither blue nor grey).

The Most aggressive policy reaches steady state at $S_t = 7521$ (see Figure 4(c)). Close examination of the host histograms in Figures 4(b) and 4(c) shows almost identical behavior both before $S_t = 3500$ and at steady state. Comparing the host usage plots in the two figures shows that slightly more hosts have unused capacity under the Most aggressive policy than under the Moderately aggressive policy (390 versus 397 hosts preserving, out of 398 hosts used). With $n_{\max}$ DOs in the system, the difference in host underutilization between the two policies is not significant.

Figure 4: Time-lapse comparison of different preservation policies. (a) Least aggressive preservation policy; system stabilization at $S_t = 8195$ (status lines: "Time step 334 of 334. 501 nodes in the system"; "398 hosts used (out of 1000), 263 hosts preserving"). (b) Moderately aggressive preservation policy; system stabilization at $S_t = 12599$ ("Time step 554 of 554. 501 nodes in the system"; "398 hosts used (out of 1000), 397 hosts preserving"). (c) Most aggressive preservation policy; system stabilization at $S_t = 7521$ ("Time step 300 of 300. 501 nodes in the system"; "398 hosts used (out of 1000), 390 hosts preserving"). Using the Most aggressive policy results in a higher percentage of DOs meeting their preservation goals sooner and makes more efficient use of limited host resources sooner.

Figure 5: Total messages sent and received by an early DO, a mid-simulation DO and all DOs. The shape of the messages-sent curves (in black) for the early node differs by preservation policy (see Figures 5(a), 5(d) and 5(g)), while the shape of the messages-received curve (in red) remains almost the same. This behavior contrasts with that of the mid-simulation node (see Figures 5(b), 5(e) and 5(h)), whose messages-sent curve does not change with the preservation policy. The growth and maintenance phases are shown in light blue and light green, respectively.

4.4 Communications

From the DO's perspective, there are two distinct phases of communication. The first, called the growth phase, is when the DO is wandering through the graph and collecting information from DOs that are already connected into the graph. The second, called the maintenance phase, begins after the DO is connected into the graph. During the growth phase, the DO is actively communicating with other DOs. During the maintenance phase, the DO is responding to queries and communications from other DOs. This change in communication patterns occurs at approximately $S_t = 3500$ in Figure 5. Figure 5 shows the communications for two different DOs and for the system in total as a function of the preservation policy. $DO_{1,c,h}$ and $DO_{250,c,h}$ were chosen to represent the messaging profiles of all DOs, to see whether the profile changes as a function of when a DO enters the system. Time in Figure 5 runs until $S_t = 15000$, and messages are counted in time bins of 100 simulation events.
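Counting messages in fixed-size time bins, as described for Figure 5, can be sketched as follows. The function name is hypothetical, and event times are assumed to be integer simulation-event indices; the sent and received streams would each be binned separately.

```python
from collections import Counter


def bin_messages(event_times, bin_size=100):
    """Count messages per time bin; bin b covers events b*bin_size .. (b+1)*bin_size - 1."""
    counts = Counter(t // bin_size for t in event_times)
    if not counts:
        return []
    return [counts[b] for b in range(max(counts) + 1)]


sent = [0, 5, 99, 100, 250]
print(bin_messages(sent))  # [3, 1, 1]
```

Empty bins are reported as zero rather than omitted, so curves for different DOs line up on the same time axis.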

Looking at Figures 5(a), 5(b), 5(d), 5(e), 5(g) and 5(h), there is a marked difference in the communication curves between $DO_{1,c,h}$ and $DO_{250,c,h}$. These curves (with only minor differences) are consistent across all preservation policies. $DO_{1,c,h}$ (the earliest DO introduced into the system) sends a rather modest number of messages, $O(2n)$, to DOs that are also in the system as it attempts to create preservation copies. Under the Least aggressive policy (see Figure 5(a)), $DO_{1,c,h}$ sends a few messages per time bin until the system enters the maintenance phase. The number of messages sent under the Moderately aggressive policy is nominally the same (see Figure 5(d)). The Most aggressive policy, by contrast, results in messages for just a couple of time bins, after which virtually no messages are sent (see Figure 5(g)). Regardless of the preservation policy, the number of messages that $DO_{1,c,h}$ receives is about the same.

Comparing the message curves for $DO_{1,c,h}$ and $DO_{250,c,h}$ indicates that the system discovered by the later DO is very different from the one discovered by the earliest DO. The late-arriving node has more than enough opportunities to satisfy its preservation goals when first introduced into the system. $DO_{250,c,h}$ sends all of its messages in one time bin and virtually nothing thereafter. This behavior is constant across all preservation policies and indicates that late-arriving DOs are able to connect with another DO very quickly and almost immediately enter the maintenance phase of their existence. The maintenance phase of the system corresponds to a combination of velocity matching and flock centering.

The system is in a growth phase from about $S_t = 1500$ to $S_t = 3500$, as shown by the rising curves in the "Sum of all DOs" sub-figures 5(c), 5(f) and 5(i). During the growth phase, wandering nodes send and receive many messages while attempting to make their initial connections into the graph. After $S_t = 3500$, the system is in a maintenance phase, in which it attempts to balance the preservation needs of the DOs with the capacity of the hosts. Comparing the message curves for the entire system (Figures 5(c), 5(f) and 5(i)) shows that there is no qualitative difference in the number of messages sent and received in the system based on preservation policy. The nuances of the message curves for early DOs are lost as the size of the system increases.

4.5 Messages Sent and Received as the System Grows in Size

Figures 3 and 5 show the efficacy and communication costs associated with a system with n_max = 500 and h_max = 1000. These values allowed the simulation to execute quickly, thereby enabling more options and combinations to be investigated. After determining that at least a Moderately aggressive preservation policy enabled a high percentage of DOs to meet at least their c_soft goals, the next area of investigation was to determine how the total number of messages changes as a function of system size. Figure 5 clearly shows that there are different types of communication during the growth and maintenance phases. During the maintenance phase, the DOs are attempting to spread their preservation copies across all the unique hosts in their friends' networks.

A cost function was developed to quantitatively investigate the performance of the various preservation policies, focusing on the number of messages sent and received. Each preservation status (see Figure 2) was assigned a value from 1 to 4 corresponding to its range relative to c_hard, and the values were scaled so that the highest status equals 1. At each time step, the cost performance of the system was evaluated and the number of messages sent and received up to that point was summed. The results are shown in Figure 6. The Most and Moderately aggressive policies achieve approximately the same level of effectiveness, but the Most aggressive policy reaches that level twice as fast as the Moderately aggressive policy while sending only half as many messages.

Figure 6: The preservation effectiveness as a function of policy and number of messages sent and received (horizontal axis: total messages used, 0 to 1.5e+07).
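The cost function above pairs a scaled preservation status with a running message count at every time step. The sketch below shows one way to compute such a curve; it is a hypothetical illustration, not the authors' code. The 1 to 4 status scale comes from the text, while averaging over all DOs, dividing by 4, and the names `effectiveness` and `effectiveness_curve` are assumptions.

```python
# Hypothetical sketch of the status-based effectiveness measure and its
# pairing with cumulative message counts. The 1..4 status scale comes from
# the text; averaging over DOs and dividing by 4 are assumptions.
from typing import List, Sequence, Tuple

MAX_STATUS = 4  # preservation statuses are numbered 1..4 before scaling


def effectiveness(statuses: Sequence[int]) -> float:
    """Scale each DO's status to (0, 1] and average over all DOs."""
    if not statuses:
        return 0.0
    for s in statuses:
        if not 1 <= s <= MAX_STATUS:
            raise ValueError(f"status {s} is outside 1..{MAX_STATUS}")
    return sum(s / MAX_STATUS for s in statuses) / len(statuses)


def effectiveness_curve(
    steps: Sequence[Tuple[Sequence[int], int]],
) -> List[Tuple[int, float]]:
    """Return (cumulative messages, effectiveness) for every time step.

    Each entry of `steps` is (statuses_at_step, messages_at_step), where
    messages_at_step counts messages sent and received during that step.
    """
    curve: List[Tuple[int, float]] = []
    total_messages = 0
    for statuses, messages in steps:
        total_messages += messages  # running sum "up to that point"
        curve.append((total_messages, effectiveness(statuses)))
    return curve


if __name__ == "__main__":
    # Three DOs observed over three time steps (made-up illustrative values).
    demo = [([1, 1, 2], 10), ([2, 3, 3], 5), ([4, 4, 4], 1)]
    for point in effectiveness_curve(demo):
        print(point)
```

Plotting effectiveness against cumulative messages for each policy yields curves of the kind described for Figure 6: a policy that is more efficient reaches a given effectiveness further to the left on the message axis.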

The current simulation is a Boundary High condition (see Table 1). One of the contributing factors to spreading preservation copies across many hosts is the limited capacity of the hosts to support preservation. To remove the effects of maintenance communications and focus purely on the effect of the number of DOs in the system, a series of simulations was run using a Feast condition environment (see Table 1) where h_cap = 2 × n_max. This ensured that there would be room on the host for any DO that discovered the host via one of its friends. Based on these simulations, the total number of messages exchanged during the growth phase approximates O(n^2), and the incremental messaging cost of each new DO to the system is O(2n).
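The two growth-phase figures are linked by simple arithmetic: if the DO that brings the system to k DOs costs about 2k messages, then n arrivals cost 2 + 4 + ... + 2n = n(n + 1) messages, which grows as n^2. The sketch below checks this; treating the O(2n) bound as an exact 2k per arrival is an assumption made only for illustration, and `growth_messages` is a hypothetical name.

```python
# Hypothetical model: the DO that brings the system to k DOs costs about
# 2k messages (the O(2n) incremental cost treated as exact).


def growth_messages(n: int, per_do_factor: int = 2) -> int:
    """Total growth-phase messages for n DOs under the 2k-per-DO assumption."""
    return sum(per_do_factor * k for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (10, 500, 5000):
        total = growth_messages(n)
        # total equals n * (n + 1), so total / n**2 tends to 1 as n grows
        print(n, total, total / n**2)
```

For the graph sizes studied (10 to 5,000 DOs) the ratio to n^2 stays close to 1, which is what the O(n^2) growth-phase estimate expresses.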

5. CONCLUSION 

We have shown that implementing Reynolds' "boids" model with a limited number of rules in an autonomic system can result in digital objects (DOs) behaving in a manner that works towards the betterment of the whole by occasionally sacrificing an individual. Using simulations, we investigated different policies that DOs could use when making preservation copies of themselves. Based on simulations of 500 DOs and hosts with limited preservation capacity, the Most aggressive preservation policy enabled the DOs to attain the same preservation percentage in half the time of a Moderately aggressive policy while exchanging only half as many messages.
The Most aggressive preservation policy tries to make as many copies as it can, up to c_hard, at its first opportunity and then single copies thereafter. A Moderately aggressive preservation policy tries to make up to c_soft copies at its first opportunity and then single copies thereafter until it reaches c_hard. The Least aggressive preservation policy attempts to make one preservation copy per opportunity until it reaches c_hard.
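The three policies differ only in how many copies a DO requests at a given opportunity, which makes them easy to express as a single decision function. The following is a hypothetical sketch, not the simulator's code: the names `Policy` and `copies_to_request`, the caller-supplied `first_opportunity` flag, and the assumption that every policy stops requesting copies once it holds c_hard are all illustrative choices.

```python
# Hypothetical encoding of the three preservation policies described above.
# Assumptions: every policy stops at c_hard, and "first opportunity" is
# signalled by the caller.
from enum import Enum


class Policy(Enum):
    LEAST = "Least aggressive"
    MODERATE = "Moderately aggressive"
    MOST = "Most aggressive"


def copies_to_request(policy: Policy, current: int, c_soft: int,
                      c_hard: int, first_opportunity: bool) -> int:
    """How many new preservation copies a DO tries to make at one opportunity."""
    if current >= c_hard:
        return 0  # hard goal already met; nothing to request
    if first_opportunity and policy is Policy.MOST:
        return c_hard - current  # as many as it can, up to c_hard
    if first_opportunity and policy is Policy.MODERATE:
        return max(c_soft - current, 1)  # up to c_soft, but at least one
    return 1  # single copies thereafter (and always, for LEAST)


if __name__ == "__main__":
    for p in Policy:
        print(p.value, copies_to_request(p, current=0, c_soft=2, c_hard=5,
                                         first_opportunity=True))
```

Front-loading requests in this way is one plausible reason the Most aggressive policy settles quickly: a DO that reaches c_hard at its first opportunity has nothing left to ask for until a copy is sacrificed.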

There are two distinct communication message profiles: one prior to all the DOs being introduced into the system and one after. The system's growth period is characterized by many messages being sent from the wandering DO and few being received while the DO attempts to make its appropriate number of preservation copies. The maintenance period is characterized by relatively few messages, as the DO is directed to sacrifice preservation copies for the greater good of the graph and subsequently must create copies anew. There are distinct differences between the growth message profiles of early and late arriving DOs, depending on the preservation policy.
Named host capacity (condition) | Least aggressive | Moderately aggressive | Most aggressive
Famine (h_cap < c_soft < c_hard) and Boundary Low (h_cap = c_soft < c_hard) | Lowest percentage of DOs achieving preservation goal. | Equally marginally effective. | Equally marginally effective.
Straddle (c_soft < h_cap < c_hard), Boundary High (c_soft < c_hard = h_cap) and Feast (c_soft < c_hard < h_cap) | Lowest percentage of DOs achieving preservation goal. | The baseline against which others are measured. | This preservation policy is twice as efficient as the Moderately aggressive policy for these named conditions.
Table 1: The effectiveness of various preservation policies based on named host capacity conditions. In this table we have taken the liberty to abuse the definitions of h_cap, c_soft, and c_hard, interpreting them to apply to the total system rather than to a single host or DO. In all cases, the Least aggressive policy was the least successful at meeting the system's preservation goals. Under the Famine and Boundary Low conditions, when it would be impossible to meet preservation goals, both the Moderately and Most aggressive policies arrived at approximately the same steady state after exchanging approximately the same number of messages. Straddle conditions would permit some DOs to achieve their goals, if the DOs were fortunate. Straddle results under the Moderately and Most aggressive policies are comparable, with the Most aggressive reaching steady state
after exchanging about | as many messages as the Moderately aggressive policy. Boundary High and Feast 
conditions have enough capacity for the system to meet its preservation needs, and the Most aggressive policy 
operates about twice as efficiently as the Moderately aggressive policy. 
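The named conditions in Table 1 amount to comparing a system-wide capacity h_cap against c_soft and c_hard. The helper below is a hypothetical sketch of that comparison; it follows the caption's reading that Famine and Boundary Low cannot meet the preservation goals while Boundary High and Feast have enough capacity, and the function name `classify_condition` is an assumption.

```python
# Hypothetical helper: name the Table 1 condition for system-wide values of
# h_cap, c_soft and c_hard (Famine = too little capacity, Feast = surplus).


def classify_condition(h_cap: int, c_soft: int, c_hard: int) -> str:
    """Return the named host capacity condition, assuming 0 < c_soft < c_hard."""
    if not 0 < c_soft < c_hard:
        raise ValueError("expected 0 < c_soft < c_hard")
    if h_cap < c_soft:
        return "Famine"
    if h_cap == c_soft:
        return "Boundary Low"
    if h_cap < c_hard:
        return "Straddle"
    if h_cap == c_hard:
        return "Boundary High"
    return "Feast"


if __name__ == "__main__":
    # The same goals under increasing capacity sweep through all five names.
    for cap in (1, 2, 3, 4, 8):
        print(cap, classify_condition(cap, c_soft=2, c_hard=4))
```

Under this reading, setting h_cap = 2 × n_max, as in the Feast simulations described earlier, places the system well inside the Feast region for any goals below that capacity.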